1. INTRODUCTION


Most, if not all, cancer cells proliferate much faster than normal cells, so understanding how
they do so is key to understanding cancer pathogenesis. The mammalian target of rapamycin
(mTOR) pathway controls cell proliferation and is commonly deregulated in many cancers.
Tuberous Sclerosis Complex (TSC) negatively regulates the mTOR pathway, so studying the role
of the mTOR pathway in cell proliferation is important for understanding the pathogenic
mechanism of TSC. When cells are stimulated to proliferate, the first thing they do is produce
large amounts of protein. To make more protein, they must make more messenger RNAs (mRNAs)
from DNA. The whole process is called gene expression, and mRNA is a key molecule in it. How
mRNAs are made, and how their production is regulated in cancer, are therefore important
questions for understanding cancer at the molecular level.

Generally, an mRNA undergoes a complex series of processing steps before it is competent for
protein synthesis. Recently, we discovered pervasive production of truncated mRNAs when
mTOR is activated in cells. These truncated mRNAs arise from dysregulation of one of the steps
of mRNA synthesis, and their cellular consequence is the production of truncated proteins.
Many proteins consist of catalytically active domains and regulatory domains. The active domain
carries out the function of the protein, and the regulatory domain is a platform through which
other cellular proteins fine-tune its activity. Interestingly, many truncated proteins produced
upon mTOR activation lacked either the regulatory or the catalytic domain. This suggests that
mTOR activation produces many deregulated “super isoform” proteins by truncation, which could
drive fast cell proliferation and cancer initiation at the molecular level. Our goals in this
proposal are to find these isoforms and understand their function in cell proliferation through
a series of experiments employing high-throughput profiling technologies, including
next-generation sequencing and multi-dimensional targeted LC-MS/MS. More importantly, we will
narrow down the list of truncated mRNAs that are crucial for cell proliferation. The identified
truncated mRNAs will be new targets in cancer-related research and will provide new molecules
that function as drivers of TSC-related pathogenesis.
3. ACCOMPLISHMENTS


What  were  the  major  goals  of  the  project? 

As outlined below, the major goals of the project during the first funding period were to
identify mTOR-responsive truncated mRNA and protein isoforms using a combination of
high-throughput profiling technologies and bioinformatics tools. Specific Aim 1 is divided into
two major tasks, covering the functional transcriptome and the functional proteome. Each major
task comprises a series of experiments and goals, which are listed under its subheading.

Aim  1:  Identify  mTOR-responsive  truncated  mRNA  and  protein  isoforms 

Major  task  1.1:  Profiling  of  mTOR-responsive  truncated  mRNAs 

PacBio Iso-seq (months 1-3): Minor changes in this approach. Cost-matched alternatives to this
approach were adopted and are now complete.

RNA-seq in the presence of Torin 1 (months 1-3): Completed.

Assembly of isoform database and integration of RNA-seq with Iso-seq (months 4-6): With
minor modifications, this approach is complete.

Major task 1.2: Development of targeted mass spectrometry for truncated proteins

In  silico  C-terminal  database  assembly  (months  7-8):  Completed. 

Targeted  Mass  spectrometry  (months  9-12):  Completed. 


What  was  accomplished  under  these  goals? 

An  integrative  method,  IntMAP,  to  profile  alternative  polyadenylation: 

We developed a novel algorithm called IntMAP (Integrative Model for Alternative
Polyadenylation), which integrates RNA-seq and 3’-end-seq data for exhaustive analysis of
alternative cleavage and polyadenylation (APA) events (Figure 1). IntMAP first defines the
positions of the polyadenylation sites in a gene and deduces the 3’-UTR isoforms of the gene
from them. It then integrates the quantitative information from the RNA-seq and 3’-end-seq data
to calculate the expression level of each inferred 3’-UTR isoform. Two elements of IntMAP work
together in this quantitation: the first makes the isoform expression comply with the observed
RNA-seq read counts, and the second encourages consistency between the isoform expression
learned from the RNA-seq data and that learned from the 3’-end-seq data. After quantitation,
the calculated expression levels of the different 3’-UTR isoforms are subjected to a
chi-squared test to determine whether a gene undergoes an APA event in a given biological
context (Figure 1).
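
To illustrate the final step, the sketch below applies a chi-squared test of independence to a 2x2 table of estimated isoform expression (short vs. long 3’-UTR isoform, in two conditions). It is a minimal sketch and not the IntMAP implementation: the function name, the example values, and the choice of a 2x2 table without continuity correction are assumptions made for illustration.

```python
import math


def chi_squared_2x2(short_a, long_a, short_b, long_b):
    """Chi-squared test of independence on a 2x2 isoform-by-condition table.

    Returns (statistic, p_value). Hypothetical helper for illustration only;
    no continuity correction is applied.
    """
    observed = [[short_a, long_a], [short_b, long_b]]
    row_totals = [sum(row) for row in observed]
    col_totals = [short_a + short_b, long_a + long_b]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs a non-zero total")

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed[i][j] - expected) ** 2 / expected

    # With one degree of freedom, the chi-squared survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Made-up expression estimates: condition A favours the long isoform,
# condition B favours the short isoform.
stat, p = chi_squared_2x2(short_a=120, long_a=380, short_b=260, long_b=240)
print(f"chi2 = {stat:.2f}, p = {p:.3g}")
```

A small p-value here would flag the gene as a candidate APA event between the two conditions; in a genome-wide analysis the p-values would also need correction for multiple testing.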

Figure 1. The development of IntMAP software to profile alternative polyadenylation.
(Recoverable panel labels: "Chi-squared Test"; "Comprehensive profiling of APA".)

Bipartite expression profile of truncated mRNAs in response to cellular mTOR activity:

To identify APA isoforms that could be crucial for cell proliferation, we first profiled
truncated mRNAs that are differentially expressed upon changes in cellular mTOR activity.
Consistent with our findings in WT and Tsc1-/- MEFs, numerous genes increased or decreased
expression of their truncated mRNA isoforms upon inhibition of mTOR (Figure 2). We consider
truncated mRNAs that are up-regulated in the presence of Torin 1 to be normal tissue-enriched
truncated mRNAs, and those that are down-regulated in the presence of Torin 1 to be
TSC-related truncated mRNAs. The profiles of differentially expressed truncated mRNAs partially
overlapped between the two comparison datasets (WT vs. Tsc1-/- MEFs; Tsc1-/- mock vs.
Tsc1-/- + Torin 1), suggesting that truncated mRNAs might provide additional molecular
signatures that define the characteristics of TSC biology.
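
The classification and the overlap between the two comparisons can be sketched as follows. This is only an illustration: the gene names, fold-change values and threshold are made up, the function name is hypothetical, and treating "higher in Tsc1-/- than in WT" as the matching direction in the first comparison is an assumption.

```python
def classify_by_torin1_response(log2_fold_changes, threshold=1.0):
    """Split genes by the change in truncated-isoform expression on Torin 1.

    log2_fold_changes maps gene -> log2(+Torin 1 / mock). The threshold is an
    illustrative assumption, not a value taken from the report.
    """
    up = {g for g, lfc in log2_fold_changes.items() if lfc >= threshold}
    down = {g for g, lfc in log2_fold_changes.items() if lfc <= -threshold}
    return up, down


# Made-up values for the Tsc1-/- mock vs. Tsc1-/- + Torin 1 comparison.
torin1_vs_mock = {"geneA": 1.8, "geneB": -2.1, "geneC": -1.4, "geneD": 0.2}
# Made-up result of the WT vs. Tsc1-/- comparison: genes whose truncated
# mRNA is assumed to be higher in Tsc1-/- MEFs.
elevated_in_tsc1_null = {"geneB", "geneE"}

normal_enriched, tsc_related = classify_by_torin1_response(torin1_vs_mock)
shared = tsc_related & elevated_in_tsc1_null

print("normal tissue-enriched:", sorted(normal_enriched))
print("TSC-related:", sorted(tsc_related))
print("shared between comparisons:", sorted(shared))
```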

Integration of in silico proteogenomics and LC-MS/MS to survey intron-coded peptide sequences:

We developed a new workflow to identify intron-coded peptide sequences at the C-terminus of
truncated proteins (Figure 3). The original C-terminomics protocol (from a published method),
which uses chemical blocking, did not work in our lab. An SRM approach for targeted LC-MS/MS
could identify intron-coded peptides, but its efficiency was low (10 proteins identified out
of 1,151 candidate proteins).

We therefore developed a new method that integrates an in silico proteogenomics database,
extensive biochemical fractionation and high-capacity LC-MS/MS to conduct C-terminomics
(Figure 3). In this approach, we first fractionate samples by size and use each fraction for
tryptic digestion and LC-MS/MS. In parallel, an in silico peptide sequence database is prepared
from the profile of truncated mRNAs. The resulting high-resolution MS/MS spectra are matched
against the peptide database, and the matched sequences are considered C-terminal peptides of
truncated proteins. As summarized in the accompanying table, this approach identified 779
matching MS/MS spectra, representing 165 unique peptides from 107 truncated proteins. Of the
1,151 possible truncated proteins, we were thus able to validate roughly 9% (107 of 1,151),
compared with about 1% (10 of 1,151) with the earlier SRM approach.
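
The database-building step and the fractions behind the percentages quoted above can be sketched as follows. This is a minimal sketch under stated assumptions: the cleavage rule (after K or R, not before P, no missed cleavages), the minimum peptide length, the function names and the example sequences are all illustrative and are not taken from the workflow itself. Only the counts 1,151, 107 and 10 come from the report.

```python
def tryptic_peptides(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def intron_coded_peptides(exon_coded, intron_coded, min_length=6):
    """Keep tryptic peptides that extend into the intron-coded C-terminal tail."""
    boundary = len(exon_coded)
    kept, position = [], 0
    for peptide in tryptic_peptides(exon_coded + intron_coded):
        position += len(peptide)  # end coordinate of this peptide
        if position > boundary and len(peptide) >= min_length:
            kept.append(peptide)
    return kept


# Made-up sequences for one hypothetical truncated protein.
database = intron_coded_peptides("MSTAVLKEGR", "VSLPGKWHCEQTR")
print(database)

# Fractions of the 1,151 candidates validated by each approach (report counts).
candidates = 1151
new_workflow_fraction = 107 / candidates
srm_fraction = 10 / candidates
print(f"new workflow: {new_workflow_fraction:.1%}, SRM: {srm_fraction:.1%}")
```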


Total number of peptides in in silico proteogenomics database:   1151
Total number of MS/MS spectra matched:                            779
Total number of peptides identified:                              165
Total number of proteins identified:                              107

Figure 3. A workflow for C-terminomics. (Recoverable panel labels: "Matching MS/MS spectra";
"Identification of peptides in the C-terminus of truncated proteins".)


What  opportunities  for  training  and  professional  development  has  the  project  provided? 

Nothing  to  report. 
4. IMPACT

What  was  the  impact  on  the  development  of  the  principal  discipline(s)  of  the  project? 

Development of IntMAP: The software is designed to map the location of polyadenylation sites
in a transcriptome. It will be released as open-source software, and a user-friendly version
will be freely downloadable. This will help research communities studying human diseases
caused by malfunction of RNA processing.

Gene profile for truncated mRNA and protein isoforms: Our data provide evidence that
truncated mRNA and protein isoforms exist in cells. More importantly, we identified truncated
isoforms whose expression changes dynamically with cellular mTOR activity. These new isoforms
relevant to mTOR biology could be important for TSC pathogenesis. The list of these isoforms
will help in understanding TSC pathogenesis from a different angle, since they could be hidden
players in TSC biology.

In silico proteogenomics database and the workflow for C-terminomics: We have developed a
new workflow for mass spectrometry-based C-terminomics to identify peptides encoded by intron
regions. The database for mouse is now complete and available to the research community. In
addition, a systematic approach to identify those peptides using mass spectrometry is also
available. These will have a broad impact on research fields from proteomics to genomics to
pathobiology.


What  was  the  impact  on  other  disciplines? 

Nothing  to  report. 

What  was  the  impact  on  technology  transfer? 

Nothing  to  report. 

What  was  the  impact  on  society  beyond  science  and  technology? 

Nothing  to  report. 
5. CHANGES/PROBLEMS
Changes  in  approach  and  reasons  for  change 

One experimental approach in major task 1.1, PacBio Iso-seq, was replaced with alternative
approaches: 3’-end seq, integration of the two sequencing methods, and the resulting
development of software. The change was prompted by an unexpected increase in the service cost
of PacBio Iso-seq. After the project was awarded, only a new and improved version of PacBio
Iso-seq was available, at a significantly higher sequencing cost; the price was more than
double the original estimate. Since the cost of RNA-seq experiments also increased with a
newer version of the technology, it was impossible to afford both profiling methods to achieve
the goals of Specific Aim 1. We therefore adopted an alternative sequencing method, together
with the development of a bioinformatics tool to integrate the two sequencing datasets.


Actual  or  anticipated  problems  or  delays  and  actions  or  plans  to  resolve  them 

The sequencing cost increased. To overcome this problem, we conducted 3’-end seq, which
catalogs transcriptome-wide polyadenylation sites in mRNAs. We then developed an algorithm
that applies the positional information of poly(A) sites from the 3’-end seq data to the
RNA-seq data and quantitates alternative poly(A) isoform expression. With this approach, we
were able to analyze the expression of Torin 1-responsive truncated mRNAs. The 3’-end seq
experiments and the development of the new software brought the cost of sequencing down (thus
matching the budget) while still achieving the goals set out in the grant proposal.
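
One simple way to apply known poly(A) site positions to RNA-seq data is sketched below. It is a coverage-difference heuristic offered as an illustration, not the algorithm developed in this project: it assumes transcript-oriented coordinates, uniform coverage along each isoform, and that every isoform starts at the same position. The function name and the example coverage are hypothetical.

```python
def isoform_abundance_from_coverage(coverage, utr_start, polya_sites):
    """Estimate 3'-UTR isoform abundance from per-base RNA-seq coverage.

    coverage:    read depth per position (0-based, 5' to 3')
    utr_start:   first position of the 3'-UTR
    polya_sites: sorted end positions (exclusive), proximal to distal,
                 as catalogued by 3'-end sequencing

    The segment between two consecutive sites is covered only by isoforms
    ending at or beyond the distal one, so the abundance of the isoform
    ending at site k is the drop in mean coverage across that site.
    """
    boundaries = [utr_start] + list(polya_sites)
    segment_means = []
    for left, right in zip(boundaries, boundaries[1:]):
        segment = coverage[left:right]
        if right <= left or len(segment) != right - left:
            raise ValueError("sites must be increasing and within coverage")
        segment_means.append(sum(segment) / len(segment))

    abundances = []
    for k, mean in enumerate(segment_means):
        downstream = segment_means[k + 1] if k + 1 < len(segment_means) else 0.0
        abundances.append(max(mean - downstream, 0.0))
    return abundances


# Made-up coverage: depth 30 up to the proximal site, depth 10 beyond it.
example_coverage = [30] * 100 + [10] * 200
short_isoform, long_isoform = isoform_abundance_from_coverage(
    example_coverage, utr_start=0, polya_sites=[100, 300]
)
print(short_isoform, long_isoform)
```

Comparing the short-to-long ratio between mock-treated and Torin 1-treated samples would then indicate whether a truncated isoform responds to mTOR inhibition.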


Changes  that  had  a  significant  impact  on  expenditures 

Nothing  to  report. 

Significant  changes  in  use  or  care  of  human  subjects,  vertebrate  animals,  biohazards, 
and/or  select  agents 

Nothing  to  report. 

Significant  changes  in  use  or  care  of  human  subjects 

Nothing  to  report. 

Significant changes in use or care of vertebrate animals

Nothing  to  report. 

Significant  changes  in  use  of  biohazards  and/or  select  agents 

Nothing  to  report. 
6. PRODUCTS

Publications,  conference  papers,  and  presentations 

Yeh  HS,  Zhang  W,  Yong  J.  Analyses  of  alternative  polyadenylation:  from  old  school 
biochemistry  to  high-throughput  technologies.  BMB  reports.  2017;  50(4):201-207.  PMID: 
28148393,  PMCID:  PMC5437964 


Website(s)  or  other  Internet  site(s) 

Nothing  to  report. 

Technologies  or  techniques 

Nothing  to  report. 

Inventions,  patent  applications,  and/or  licenses 

Nothing  to  report. 


Other  Products 

Software: 

IntMAP (Integrative Model for Alternative Polyadenylation), which identifies alternatively
polyadenylated mRNAs genome-wide, has been developed. The source code will be made public
once the manuscript is published in a scientific journal. In addition, a user-friendly free
software package will be prepared for easy download and distribution.

Data: 

The RNA-seq and 3’-end seq data reported in this annual report will be deposited in a public
data repository once the manuscript is published.

Database: 

In  silico  proteogenomics  database  for  genome-wide  intron-coded  peptide  sequences  will  be 
publicly  available  once  the  manuscript  is  published. 


Co-I
Project:
Nearest person month worked:
Dr. Park collaborated
Project:
an algorithm and software for the detection of truncated mRNAs.
6
month worked:
Dr. Chang worked on